While recent experiments with relatively large neural populations show significant beyond-pairwise,
or higher-order, correlations (HOC), the impact of HOC on the network's ability to encode
information is poorly understood. We investigate how the biophysical properties of neurons in
networks shape HOC, and how HOC affect population coding. Specifically, we show that input
nonlinearities similar to those observed in physiology experiments are equivalent to beyond-pairwise
interactions in spin-glass-type statistical models. We then discuss one such model with
parameterized pairwise and higher-order interactions, revealing conditions under which beyond-pairwise
interactions increase the mutual information between a given stimulus type and the population
responses. For jointly Gaussian stimuli, coding performance is improved by shaping output HOC via
input nonlinearities when neural firing rates are constrained to be sufficiently low. For natural
image stimuli, performance improves for a broader range of firing rates. Our work suggests surprising
connections between single-neuron biophysics, population activity statistics, and normative theories
of population coding.

The number of neurons whose activities can be simultaneously recorded is rapidly increasing [1]. We thus have
an advancing understanding of the statistics of population activities, such as the relative frequencies of co-active neural
pairs, triplets, etc. In particular, much work has investigated the distributions of simultaneously recorded retinal
ganglion cell "words" (patterns of binary neural activities). For some population sizes and stimuli, these distributions
are well fit by pairwise maximum entropy (ME) models [2, 3], while in other cases beyond-pairwise interactions are
evident in the data and models incorporating higher-order correlations (HOC) are needed [4]. Cortical studies yield
similar observations [5-7].

How do these correlations affect population coding? Much work has investigated how pairwise correlations affect the
population's ability to transmit information [8-16]. Coding studies of HOC are limited,
but empirical work shows that, in some cases, including HOC allows a decoder to recover the stimulus presented to a
neural population 3 times faster than a decoder with access only to pairwise statistics [4]. Intriguingly, other work [5]
shows that HOC reduce the mutual information (MI) between the stimuli and resultant population responses. Overall,
it is not yet well understood how HOC affect signal coding. We take a normative approach that aims to start filling
this void.

Encoding model: We generalize the approach of Tkacik et al. [12], and model the activity of a population of neurons
by a triplet-wise ME distribution, wherein the activities $\{\sigma\}$ ($\sigma_i \in \{0, 1\}$ is the silent vs. spiking state of neuron $i$)
are distributed as

$$p(\sigma) = \frac{1}{Z} \exp\left[\beta\left(\sum_i h_i \sigma_i + \sum_{i<j} J_{ij} \sigma_i \sigma_j + \sum_{i<j<k} \gamma_{ijk} \sigma_i \sigma_j \sigma_k\right)\right], \qquad (1)$$

where $Z$ is the normalizing constant (partition function).

Here, $\beta$ specifies the distribution's width, defining neural reliability [12, 17], analogous to the inverse temperature
of an Ising spin-glass model. The parameters $h_i$, $J_{ij}$, and $\gamma_{ijk}$ describe the biases, pairwise interactions, and triplet
interactions, respectively. This distribution is the one that specifies the means, covariances, and three-point correlations of
the activity distribution, while making the fewest possible assumptions about the distribution overall [2, 3, 18-20].

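As emphasized by [12], this parameterization of $p(\sigma)$ can be interpreted as a static nonlinear input-output neural
model. Consider the probability of one neuron firing (having $\sigma_i = 1$), conditioned on the states of the other neurons
and the biases:

$$p(\sigma_i = 1 \mid \{\sigma_{j \neq i}\}, \{h\}) = g(\beta x_i), \qquad x_i = h_i + \sum_{j \neq i} J_{ij} \sigma_j + \sum_{j<k;\; j,k \neq i} \gamma_{ijk} \sigma_j \sigma_k, \qquad (2)$$

where $g(\beta x_i) = (1 + e^{-\beta x_i})^{-1}$ is a sigmoidal function (Fig. 1). The firing probability in one discrete time bin is akin to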
the mean firing rate. Since firing rates that vary sigmoidally with synaptic input are commonly encountered [21-23], we
interpret the argument ($x_i$) of the sigmoid as the input to a linear-nonlinear model neuron. With no beyond-pairwise
interactions ($\gamma_{ijk} = 0$), the bias $h_i$ and recurrent inputs to the neuron $\{J_{ij}\sigma_j\}$ add; the sigmoidal function of that sum
determines the firing rate. If $\gamma_{ijk} > 0$, then when neurons $j$ and $k$ are co-active, the recurrent input to neuron $i$ is
$J_{ij} + J_{ik} + \gamma_{ijk}$, which is larger than the sum of the contributions observed when only one recurrent input is active at
a time ($J_{ij} + J_{ik}$); these inputs combine super-linearly. Conversely, for $\gamma_{ijk} < 0$, they combine sub-linearly. Thus, the
way that synaptic inputs combine maps onto triplet interactions in statistical models of population activity, shaping
beyond-pairwise correlations. If the recurrent input to neuron $i$ is an arbitrary function of the activities of the other
neurons, $x_i = h_i + f(\{\sigma_{j \neq i}\})$, triplet interactions come from the first nonlinear terms in the series expansion of $f(\cdot)$
(see Supplemental Information). We will later return to possible biophysical mechanisms behind such nonlinearities,
although we note that interaction terms can also come from common noise inputs to neurons [24, 25]. We further note
that, in our model, these "recurrent" interactions are instantaneous, which is not true for physical neurons. Studies
of HOC in dynamical models, as in Ref. [25], are an intriguing area for future work.

FIG. 1: Encoding model. (A) Each model neuron has a bias $h_i$ (blue) determined by external stimuli. Recurrent pairwise
($J_{ij}$, red) and triplet ($\gamma_{ijk}$, green) interactions further modify the output statistics. Schematic given for N=3 neurons; data
presented below are for networks of N=10 neurons. (B) The firing rate (spiking probability) of each neuron varies sigmoidally
with the strength of its input ($x_i = h_i + \sum_{j \neq i} J_{ij} \sigma_j + \sum_{j<k;\; j,k \neq i} \gamma_{ijk} \sigma_j \sigma_k$); see text. The steepness of the sigmoid depends on
$\beta$.
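The correspondence between the triplet-wise ME distribution and a sigmoidal input-output neuron can be checked numerically on a small network. The following is a minimal sketch, assuming homogeneous interactions ($J_{ij} = J$, $\gamma_{ijk} = \gamma$) and illustrative parameter values that are not taken from the paper; the function names are hypothetical. It compares the conditional firing probability computed directly from the joint distribution with the sigmoid of the input $x_i$.

```python
import itertools
import math

def log_weight(sigma, h, J, gamma, beta):
    """Log of the un-normalized probability of state sigma (homogeneous J, gamma)."""
    a = sum(sigma)                          # number of active neurons
    pairs = a * (a - 1) / 2                 # number of co-active pairs
    triplets = a * (a - 1) * (a - 2) / 6    # number of co-active triplets
    bias = sum(h_i * s_i for h_i, s_i in zip(h, sigma))
    return beta * (bias + J * pairs + gamma * triplets)

def conditional_firing_prob(i, sigma, h, J, gamma, beta):
    """P(sigma_i = 1 | other neurons), computed from the joint distribution."""
    on = list(sigma)
    on[i] = 1
    off = list(sigma)
    off[i] = 0
    w_on = math.exp(log_weight(on, h, J, gamma, beta))
    w_off = math.exp(log_weight(off, h, J, gamma, beta))
    return w_on / (w_on + w_off)

def sigmoid_prediction(i, sigma, h, J, gamma, beta):
    """g(beta * x_i), where x_i is the bias plus pairwise and triplet input."""
    a_others = sum(s for idx, s in enumerate(sigma) if idx != i)
    x_i = h[i] + J * a_others + gamma * a_others * (a_others - 1) / 2
    return 1.0 / (1.0 + math.exp(-beta * x_i))

def max_discrepancy(h, J, gamma, beta):
    """Largest gap between the two computations over all states and neurons."""
    n = len(h)
    return max(
        abs(conditional_firing_prob(i, s, h, J, gamma, beta)
            - sigmoid_prediction(i, s, h, J, gamma, beta))
        for s in itertools.product((0, 1), repeat=n)
        for i in range(n)
    )
```

Calling, for example, `max_discrepancy([0.2, -0.1, 0.4], 0.5, -0.3, 1.5)` should return a value at the level of floating-point round-off, since the two expressions are algebraically identical.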

When do HOC improve coding?: Having motivated our probability model, we ask when nonlinear operations improve
coding. To do this, we use the framework introduced by Tkacik et al. [12] to study population coding with pairwise
interactions. The bias terms in our model are made stimulus-dependent: $h_i = h_i^s + h_i^0$, where $h_i^0$ is the
stimulus-independent bias, and $h_i^s$ is the stimulus-dependent one. As reviewed above, the stimuli enter as additive inputs in
the L-N model, and we define the stimulus distribution by the joint distribution over $h^s$ [12].

For a given stimulus distribution and reliability $\beta$, we numerically find the $h_i^0$, $J_{ij}$, and $\gamma_{ijk}$ that maximize the MI
between stimuli and responses:

$$\mathrm{MI} = -\sum_{\{\sigma\}} p(\sigma) \log[p(\sigma)] + \int dh^s\, p(h^s) \sum_{\{\sigma\}} p(\sigma|h^s) \log[p(\sigma|h^s)]. \qquad (3)$$

The first term is the response entropy, and the second term is (minus) the mean entropy of the response conditioned
on the stimulus (noise entropy). To simplify our calculations, we consider homogeneous parameter values: $h_i^0 = h^0$,
$J_{ij} = J$, and $\gamma_{ijk} = \gamma$ $\forall\, i,j,k$. For consistency, we use permutation-symmetric stimulus distributions. However, for
any given stimulus example $h^s$, the conditional response distribution will not necessarily be permutation-symmetric.

We also optimize MI over $h^0$ and $J$ while constraining $\gamma = 0$, the triplet-forbidden case. Comparing the maximum
attainable MI with triplet interactions allowed or forbidden, we ascertain when, and by how much, their presence improves
coding. This is related to 3rd-order connected information [18], where one fits both 2nd- and 3rd-order ME models to
the stimulus-conditioned response distributions and compares the resulting MI. Because we separately optimize the
2nd- and 3rd-order models, we obtain more conservative estimates of how MI increases due to the 3rd-order interactions.
We note that, when triplet interactions are allowed, the optimum can still occur for $\gamma = 0$. In this case the maximal
MI will be equal for networks allowing and forbidding triplet interactions.

The data shown herein are for networks of N = 10 neurons. This is as large as we can consider while being able to
numerically optimize our MI function with reasonable speed (see Supplemental Information for methods). We leave
the study of larger networks, together with those with heterogeneous interactions, for future work.

For jointly Gaussian stimuli (Fig. 2A) of varying levels of correlation $\rho$, triplet interactions confer no coding
benefit (Fig. 3A,C): even when triplet interactions are allowed, the optimal encoder has $\gamma = 0$ (Fig. 4A). Here, the
stimulus distribution is symmetric about the mean, and the optimal encoder is on-off symmetric with $\langle\sigma\rangle = 0.5$. That
symmetry maximizes response entropy. When the stimuli are drawn from discrete binary distributions with equal
probabilities for the two states, we also find that triplet interactions confer no coding advantage (data not shown).
These observations suggest that for unskewed stimulus distributions, with on-off symmetry in the optimal response
distribution, triplet interactions are not useful for coding. To seek situations when triplet interactions might be
beneficial, we break the on-off symmetry. We do this in two different ways, both motivated by the biological problem
at hand: naturalistic stimulus distributions and constrained firing rates.

FIG. 2: Stimuli considered in this work. (A) Marginal histogram of one stimulus, $h_i^s$, for the jointly Gaussian ensemble.
(B) One of the natural luminance images used in this work, from the database of Tkacik et al. [28]. The marginal histogram of
pixel values for this set of images (normalized to have zero mean and unit variance) is skewed (C). To generate our naturalistic
stimulus ensemble, we randomly draw dectuplets of pixel values with spacing d pixels by placing the template (D) at a random
location on a randomly chosen image, and setting stimulus values $h_i^s$ to match the pixel values falling under each square. To
maintain permutation symmetry of our stimulus ensemble, we permute which square corresponds to which stimulus index $i$ for
each draw. The marginals (A,C) are the same for all stimulus indices $i$.

It has long been argued [26, 27] that natural stimulus statistics are the appropriate starting point for understanding
the peripheral sensory systems, and in particular for asking what neural encoders are useful for the stimuli experienced
by the animal. We use as stimuli calibrated luminance images from the database of Tkacik et al. [28]. By drawing
groups of pixels (Fig. 2) with variable spacing d, we vary the level of correlation between stimulus values: natural
images have autocorrelation functions that decrease with distance [29], in a manner that is surprisingly independent
of the occlusion property of objects in those images [30, 31]. Since luminance (or photon count) is non-negative,
but can be arbitrarily large, this distribution is skewed (Fig. 2C). This skew is not unique to luminance data: the
distribution of membrane potentials in rat auditory cortical neurons is skewed [32], a feature that could arise from
even modest synchrony in pooled inputs [32]. Furthermore, the spike-count distribution in dichotomized Gaussian
models of population activity is similarly skewed [24].

For the natural image luminance stimuli, we find that triplet interactions indeed confer a coding advantage. For
N = 10 cells, this is a 5-10% improvement in MI compared to the optimized purely pairwise encoder (Figs.
3B,D). The advantage is largest for close-by sampled pixels (small d), and at relatively low values of $\beta$ (i.e., relatively
unreliable neurons). Natural images have rich beyond-pairwise statistics [27]. Is that why triplet interactions improve
encoding for natural image stimuli? No: repeating these optimization experiments using linear mixtures of variables
from skewed Pearson-system marginal distributions as stimuli, we also observed that triplet interactions improved
coding (data not shown).

The second way we break the on-off symmetry of our problem is by restricting the firing rates (FRs) of the neurons
in our networks. Thus far, they have been allowed to take arbitrary values. Empirically, however, neurons are seen to
fire infrequently [2, 4, 7, 33, 34], with mean FRs of a few Hz: for 10-20 ms time bins [2, 4, 7] this yields $\langle\sigma\rangle \approx 0.01$-$0.1$.
Thus, we maximize a Lagrange function $\mathcal{L} = \mathrm{MI} - \lambda \langle\sigma\rangle$ that disfavors high FRs [12, 35], similar to the notion
of sparse coding [35-37]. By varying $\lambda$, we alter the mean FR of the optimal network [12]. We ask how the MI for
these optimal networks varies as a function of their mean FR for networks with triplet interactions either allowed or
forbidden. Intriguingly, for jointly Gaussian stimuli, which are fully defined by pairwise statistics, triplet interactions
improve coding performance at sufficiently low firing rates (Fig. 3E). The improvement is larger for stronger stimulus
correlations. For natural image stimuli, the benefits of triplet interactions extend to higher FR (Fig. 3F).

How do HOC improve coding?: For naturalistic stimuli, the negative ($\gamma < 0$) triplet interactions we observed at
optimality (Fig. 4B) sparsify neural responses by reducing the frequency of multi-spike synchrony in which many
neurons fire simultaneously. This sparsifying role of triplet interactions agrees with experimental findings [7] and
mechanistic modeling [24, 25, 44]. Importantly, $\gamma < 0$ is optimal even in the absence of a FR constraint, pointing to
a richer role in shaping response distributions.

FIG. 3: Triplet interactions can improve coding. (A) For jointly Gaussian stimuli with pairwise correlation coefficients of
0.95 (dashed lines) or 0 (solid lines), encoders with triplet interactions allowed (green) or forbidden (red, $\gamma = 0$) have the same
coding performance, which increases with neural reliability $\beta$. (B) For natural image stimuli with pixel spacings of d = 32 (solid
lines) or d = 2 (dashed lines), the triplet-allowed encoder (green) performs better. The shaded regions (similar in thickness
to the lines) around the lines in panels (A,B) show the standard deviation of the mean MI over 5 trials with different sets of
random stimuli. (C,D) To summarize how performance gains vary with correlation level, we plot the ratio of the MI for the
optimal triplet-allowed networks (MI3) to the one for triplet-forbidden networks (MI2, $\gamma = 0$) as a function of $\beta$. The darkest
curve corresponds to the largest correlation ($\rho = 0.95$ for the Gaussian (C) and d = 2 for the natural image stimuli (D)), the
lightest curve corresponds to the smallest correlation ($\rho = 0$ for the Gaussian and d = 32 for the natural image stimuli), and
intermediate shades correspond to intermediate levels of correlation. (E,F) For $\beta = 1.5$, we similarly plot the performance
ratio as a function of mean firing rate for Gaussian (E) and natural image (F) stimuli, in cases with constrained firing rates
(see text). In order to make a fair comparison, we estimate the MI of the triplet-allowed and triplet-forbidden networks at the
exact same firing rate (see Supplemental Information for details).

Following [12], we first note that (at least at small $\beta$ when $\gamma = 0$, and for all $\beta$ when $\gamma \neq 0$) the optimal encoders
have positive $J$ (Figs. 4A,B). Thus, pairwise network interactions reinforce the positive correlations already present
in the stimulus, which [12] interpret as an error-correcting property: responses tend to be constrained to a smaller
set of possibilities. This effect can be beneficial only up to a point: for very large positive $J$, neurons would all fire
synchronously regardless of the stimulus, sharply reducing response entropy. There is therefore a trade-off between
the desiderata of error correction ($J$ reinforces correlations) and high response entropy ($J$ opposes correlations).

Triplet interactions impact the trade-off in a novel way: $\gamma < 0$ combats multi-spike synchrony, so that response
entropy can be maintained even with $J$ reinforcing stimulus correlations. Triplet interactions are better suited than
pairwise ones to specifically suppressing multi-spike states, in line with the observation that $\gamma < 0$ and $J > 0$ at
optimality, and not vice versa (see Supplemental Information). The response distributions of optimal encoders with
triplets allowed (Fig. 4C) or forbidden (Fig. 4D) support this notion: even with no constraints on the FR, the
triplet-allowed network makes less use of the state in which all neurons are active. Moreover, with triplet interactions
forbidden, the optimal encoders have smaller $J$ (Fig. 4B), also consistent with the interpretation above. Finally, we
observed $\gamma < 0$ to be optimal when we constrained the firing rates as well (data not shown).

FIG. 4: Triplet interactions, when they are beneficial, sparsify the neural representation of stimuli. For jointly
Gaussian stimuli ($\rho = 0.95$) and no firing rate constraint, optimal encoders have no triplet interaction ($\gamma$, green, A). At low $\beta$,
these optimal encoders have pairwise interactions ($J$, red) that enhance the input correlations, whereas at high $\beta$ they oppose
them [12]. The optimal biases $h^0$ (blue) cancel the effective mean-field bias from pairwise interactions, $h_{\mathrm{eff}} = (N - 1) J \langle\sigma\rangle$. For
the natural image stimuli (d=2), the optimal encoders have negative triplet interactions (when triplet interactions are allowed:
solid lines) for all $\beta$ (B); the encoder parameters do not change sign. When triplet interactions are forbidden (dashed lines),
the magnitudes of the pairwise interaction and bias are smaller, and do change sign with increasing $\beta$. The solid horizontal
line (at zero) is to guide the eye. The shaded regions (similar in thickness to the lines) around the lines in panels (A,B)
show the standard deviation of the mean parameter values over 5 trials with different sets of random stimuli. For other levels
of stimulus correlation, we find qualitatively similar results (not shown). A comparison of the response distributions of the
optimal encoders for the natural image stimuli (d=2) with triplet interactions allowed (C) or forbidden (D, $\gamma = 0$) shows that
the triplet interactions sparsify the responses by reducing the probability of the state in which all neurons are active.

Allowing nonzero triplet interactions yields optimized network parameters $J$ and $\gamma$ that do not change sign as $\beta$ is
varied. This stands in contrast to the case of $\gamma = 0$, for which the encoder parameters change sign as $\beta$ is varied (Fig.
4): at low $\beta$, they reinforce the stimulus correlations, while at high $\beta$, they oppose them. This behavior is dictated
by the trade-off between noise and response entropies described above [12].

What is the biophysical origin of beyond-pairwise interactions?: In our model, higher-order interactions arise from the
nonlinear combination of recurrent inputs. Neurobiology provides several processes that can produce such nonlinear
combinations. Even for passive single-compartment neurons (no dendrites), inputs can combine sub-linearly, as follows.
Synaptic inputs open ion channels, moving the membrane potential towards that ion's reversal potential [38]. Opening
subsequent channels creates less current, as there is less driving force pushing ions through the channel [39]. Dendrites
have additional properties that yield nonlinearities [39-42]. This allows flexible higher-order interactions: both super-
and sub-linear dendritic summation are observed when two inputs impinge on the same dendritic branch [43], while
inputs to separate branches combine linearly. For strong dendritic inputs, the observed integration properties are
sub-linear [43], corresponding to negative triplet interactions, similar to what we observed (Fig. 4) for optimal
coding.

Herein, we considered nonlinear combinations of recurrent inputs. By including 3rd-order terms like $h_i \sigma_i \sigma_j$ in our
log-polynomial probability distribution (Eq. 1), one can model other input nonlinearities. Nevertheless, caution
is warranted when making mechanistic interpretations of statistical parameters observed in neural data. Pairwise
interactions in those data do not necessarily reflect synaptic couplings, as there may be common input to both
neurons from unobserved cells that causes the correlation. Similar remarks apply to higher-order interactions,
which can also be driven by spike-generating nonlinearities [24, 25, 44] and other mechanisms [18, 25].

Summary and implications: We have demonstrated that input nonlinearities are equivalent to beyond-pairwise
interactions in spin-glass statistical models of neural population activity, and observed that, under biologically relevant
conditions, these nonlinearities improve population coding. Normative theories might thus predict differences in the
summation properties of neurons in networks that are evolved (or adapted) to encode different types of stimuli, or in
networks with different pressures to regulate firing rates.
1 Triplet interactions arise from the first non-linear terms in a
series expansion of the input to our model neuron

If we let the recurrent input to neuron $i$ be an arbitrary function of the activities of the other
neurons, $x_i = h_i + f(\{\sigma_{j \neq i}\})$, triplet interactions arise from the first nonlinear terms in the series
expansion $x_i = h_i + \sum_{j \neq i} a_{ij} \sigma_j + \sum_{j \neq i} b_{ij} \sigma_j^2 + \sum_{j<k;\; j,k \neq i} c_{ijk} \sigma_j \sigma_k + \ldots$, where $a_{ij}$, $b_{ij}$, and $c_{ijk}$ are the
series coefficients, and we have omitted the constant in the expansion. Since $\sigma_j \in \{0, 1\}$, $\sigma_j^2 = \sigma_j$,
and the $b_{ij} \sigma_j^2$ terms can be grouped with the $a_{ij} \sigma_j$ ones, this yields $x_i = h_i + \sum_{j \neq i} J_{ij} \sigma_j +
\sum_{j<k;\; j,k \neq i} \gamma_{ijk} \sigma_j \sigma_k + \ldots$, where $J_{ij} = a_{ij} + b_{ij}$ and $\gamma_{ijk} = c_{ijk}$.
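For a neuron with only two binary recurrent inputs, the expansion terminates after the product term, so the pairwise and triplet coefficients can be read off exactly from the four values of $f$. The sketch below illustrates this under stated assumptions: the two summation rules (`saturating_input` and `expansive_input`) are hypothetical examples chosen for illustration, not mechanisms taken from the text.

```python
import math

def decompose_two_inputs(f):
    """Return (constant, J_j, J_k, gamma) such that, for binary s_j and s_k,
    f(s_j, s_k) = constant + J_j*s_j + J_k*s_k + gamma*s_j*s_k."""
    constant = f(0, 0)
    J_j = f(1, 0) - constant
    J_k = f(0, 1) - constant
    # The product coefficient measures the departure from linear summation.
    gamma = f(1, 1) - f(1, 0) - f(0, 1) + constant
    return constant, J_j, J_k, gamma

def saturating_input(s_j, s_k, weight=1.0):
    """Hypothetical sub-linear summation: total drive passed through tanh."""
    return math.tanh(weight * (s_j + s_k))

def expansive_input(s_j, s_k, weight=1.0):
    """Hypothetical super-linear summation: total drive squared."""
    return (weight * (s_j + s_k)) ** 2

def reconstruct(coeffs, s_j, s_k):
    """Evaluate the expansion with the fitted coefficients."""
    constant, J_j, J_k, gamma = coeffs
    return constant + J_j * s_j + J_k * s_k + gamma * s_j * s_k
```

With these assumed rules, the saturating input gives a negative product coefficient (sub-linear summation, $\gamma < 0$) and the expansive input gives a positive one (super-linear summation, $\gamma > 0$), matching the sign convention used in the main text.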



2 Numerical Methods 



2.1 Monte Carlo methods and optimization 



The mutual information between the stimuli and responses,

$$\mathrm{MI} = -\sum_{\{\sigma\}} p(\sigma) \log[p(\sigma)] + \int dh^s\, p(h^s) \sum_{\{\sigma\}} p(\sigma|h^s) \log[p(\sigma|h^s)], \qquad (1)$$

involves the sum over all $2^N$ possible population states, and an integral over the stimulus
distribution of another such sum. This function is not analytically tractable for N = 10 (the network
size considered in this work) and/or for continuous stimulus distributions. Instead, we use Monte
Carlo methods to compute the MI. In particular, we define the (un-normalized) frequency function

$$\phi(\sigma|h^s) = \exp\left[\beta\left(h^s \cdot \sigma + h^0 \sum_i \sigma_i + J \sum_{i<j} \sigma_i \sigma_j + \gamma \sum_{i<j<k} \sigma_i \sigma_j \sigma_k\right)\right]. \qquad (2)$$

This (log-polynomial) function can be evaluated very quickly. To compute the MI, we take a
large number of stimuli $h^s$ from the appropriate distribution, and evaluate the frequencies of each
of the $2^N$ states for each of the stimuli. We then divide the frequencies for each state and stimulus
by the sum of the frequencies over all states for that stimulus, to get (normalized) conditional
probabilities:

$$p(\sigma|h^s) = \phi(\sigma|h^s) \Big/ \sum_{\{\sigma\}} \phi(\sigma|h^s). \qquad (3)$$

This normalizing operation can be done quickly using matrix operations in MATLAB [1]. Note that,
if one instead defined the conditional probability for each state (instead of frequencies), then one
would need to evaluate the (costly) partition function in the calculation of the probability of each
of the $2^N$ states. Using the approach of first computing frequencies, we evaluate the partition
function only once for each stimulus value, saving $2^N - 1$ evaluations of the partition function
for each stimulus example. Given the conditional probabilities, we then compute the conditional
entropy (for each stimulus),

$$H(\sigma|h^s) = -\sum_{\{\sigma\}} p(\sigma|h^s) \log[p(\sigma|h^s)]. \qquad (4)$$

Averaging these values over the set of stimuli from our distribution, we get the noise entropy
($H_{\mathrm{noise}}$, which is minus the second term in Eq. 1). Similarly, we can average the conditional
probabilities across all stimulus examples to get the (marginal) response distribution

$$p(\sigma) = \langle p(\sigma|h^s) \rangle_{h^s}. \qquad (5)$$

Finally, we compute the entropy of the response distribution

$$H_{\mathrm{resp}} = -\sum_{\{\sigma\}} p(\sigma) \log[p(\sigma)], \qquad (6)$$

and subtract the noise entropy to get the MI: $\mathrm{MI} = H_{\mathrm{resp}} - H_{\mathrm{noise}}$.
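The procedure in Eqs. 2-6 can be sketched in a few lines. The following is an illustrative pure-Python version, not the MATLAB implementation used for the results; the function names are hypothetical, logarithms are taken to base 2, and subtracting the largest log-frequency before exponentiating is a numerical-stability step added here that leaves the normalized probabilities unchanged.

```python
import itertools
import math

def conditional_probs(h_s, states, h0, J, gamma, beta):
    """Normalized p(sigma | h_s) from the un-normalized frequencies (Eqs. 2-3)."""
    log_phi = []
    for s in states:
        a = sum(s)  # number of active neurons in this state
        drive = sum(h * s_i for h, s_i in zip(h_s, s))
        energy = (drive + h0 * a + J * a * (a - 1) / 2
                  + gamma * a * (a - 1) * (a - 2) / 6)
        log_phi.append(beta * energy)
    shift = max(log_phi)  # stability shift; cancels in the normalization
    phi = [math.exp(v - shift) for v in log_phi]
    total = sum(phi)      # one partition-function evaluation per stimulus
    return [f / total for f in phi]

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability states."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(stimuli, h0, J, gamma, beta):
    """MI = H_resp - H_noise, averaged over the given stimulus examples."""
    n = len(stimuli[0])
    states = list(itertools.product((0, 1), repeat=n))
    marginal = [0.0] * len(states)
    h_noise = 0.0
    for h_s in stimuli:
        p = conditional_probs(h_s, states, h0, J, gamma, beta)
        h_noise += entropy_bits(p) / len(stimuli)          # Eq. 4, averaged
        for idx, q in enumerate(p):
            marginal[idx] += q / len(stimuli)              # Eq. 5
    return entropy_bits(marginal) - h_noise                # Eq. 6 minus noise
```

As a sanity check, setting `beta=0` makes every conditional distribution uniform, so the response and noise entropies coincide and the estimated MI vanishes up to round-off.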

Note that, since we are using Monte Carlo integration, each evaluation of the MI function involves
a (potentially) different set of stimuli, and thus a potentially (slightly) different result, even for
identical network parameters. This noise makes gradient-based optimization methods highly
error-prone. We avoid this pitfall by using exactly the same set of stimuli in subsequent calls to the MI
function during the optimization. This common random number approach makes the MI a smooth
function of our parameters, allowing us to use gradient-based optimization techniques; see [2] for
an overview of optimization methods for noisy functions. For the optimization itself, we use the
open-source MinFunc package [3] from Mark Schmidt. We found that MinFunc was much faster
and more reliable than the minimizers in the MATLAB optimization toolbox.
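The effect of reusing one stimulus set can be illustrated on a toy problem. The sketch below is not the MI objective: the integrand $(\theta - z)^2$ and all names are assumptions made purely for illustration. It contrasts an estimator that redraws its samples on every call with one that closes over a fixed sample set, which is the common random number idea described above.

```python
import random

def make_fixed_sample_objective(samples):
    """Return an objective that reuses one fixed sample set on every call."""
    def objective(theta):
        # Toy integrand (illustrative only): mean of (theta - z)^2 over samples.
        return sum((theta - z) ** 2 for z in samples) / len(samples)
    return objective

def fresh_sample_objective(theta, rng, n_samples):
    """Same estimator, but with new samples on each call (noisy in theta)."""
    total = sum((theta - rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_samples))
    return total / n_samples

def central_difference(objective, theta, eps=1e-4):
    """Finite-difference slope; meaningful only if the objective is smooth."""
    return (objective(theta + eps) - objective(theta - eps)) / (2 * eps)

rng = random.Random(1)
fixed_samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
smooth_objective = make_fixed_sample_objective(fixed_samples)
```

Because `smooth_objective` is a deterministic function of `theta`, repeated calls agree exactly and its finite-difference slope matches the analytic slope of the sample average, whereas two calls to `fresh_sample_objective` at the same `theta` generally differ.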

In this paper, we have used ensembles of 1000 stimulus examples in evaluating the MI function. 
We repeated the optimization 5 times, with different sets of stimuli each time, and found that the 
results were highly reproducible: the standard deviation of the mean MI achieved over those 5 
trials is small (Fig. 3A,B of the main paper) - it is comparable to, or in many cases less than, 
the line width on the plots - as is the standard deviation of the mean parameter values obtained at 
optimality (Figs. 4A,B of the main paper). 

The expressions herein (and in the main paper) do not specify the base in which the logarithm is 
computed. For MI values in bits, those logarithms are to base 2. 
2.2 Comparing optimal networks with constrained firing rates



When we use Lagrange multipliers for optimizing MI with constrained firing rates (see main
paper), the exact functional relationship between Lagrange multiplier $\lambda$ and firing rate is unknown:
although higher Lagrange multipliers lead to lower firing rates, we cannot easily specify what
value of $\lambda$ is needed to achieve a given firing rate. We use the same values of the Lagrange
multipliers when we optimize with triplet interactions either allowed (TA), or forbidden (TF), resulting
in (slightly) different mean firing rates for the optimal TA and TF networks. The reason for this
difference is easy to understand, as they have different MI values, and thus the optimal trade-off
between MI and firing rate in the Lagrange function $\mathcal{L} = \mathrm{MI} - \lambda \langle\sigma\rangle$ will be slightly different.

We use linear interpolation to estimate the MI of the TF network at the exact mean firing rate of 
the TA network: since we have several points on the curve of MI vs. mean firing rate for the TF 
network, this interpolation is easy to implement. Finally, we take the ratio of the MI value for the 
TA network to the (interpolated) one for the TF network at the same firing rate to create the data in 
Figs. 3E,F. 
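The interpolation step can be sketched as follows. This is a minimal illustration, assuming the TF points are sorted by increasing firing rate; the function names are hypothetical, and any numbers passed to them in an example are made up rather than values from the paper.

```python
def interpolate_mi(rates, mis, target_rate):
    """Linearly interpolate MI at target_rate from (rate, MI) points.

    `rates` must be sorted in increasing order and bracket `target_rate`.
    """
    if not rates[0] <= target_rate <= rates[-1]:
        raise ValueError("target_rate lies outside the sampled firing rates")
    for idx in range(len(rates) - 1):
        r0, r1 = rates[idx], rates[idx + 1]
        if r0 <= target_rate <= r1:
            if r1 == r0:
                return mis[idx]
            weight = (target_rate - r0) / (r1 - r0)
            return mis[idx] + weight * (mis[idx + 1] - mis[idx])
    raise ValueError("rates must be sorted in increasing order")

def mi_ratio(ta_rate, ta_mi, tf_rates, tf_mis):
    """Ratio of the TA network's MI to the TF MI interpolated at the same rate."""
    return ta_mi / interpolate_mi(tf_rates, tf_mis, ta_rate)
```

For instance, with made-up TF points at rates 0.1, 0.2, and 0.3 and MI values 1.0, 2.0, and 2.5, a TA network at rate 0.25 would be compared against an interpolated TF value halfway between 2.0 and 2.5.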



3 Triplet interactions are better than pairwise ones at suppressing
multi-spike states, hence the observation that $\gamma < 0$
and $J > 0$ at optimality, and not vice versa

Consider the contribution $C$ of the recurrent connections to the log-polynomial probability
distribution over network states, $C = J \sum_{i<j} \sigma_i \sigma_j + \gamma \sum_{i<j<k} \sigma_i \sigma_j \sigma_k$. When this number is large,
the state is favored, and vice versa. The first term ($J \sum_{i<j} \sigma_i \sigma_j$) is $J$ times the number of active
neural pairs, which is $O(a^2)$, where $a$ is the number of co-active neurons, while the second term
($\gamma \sum_{i<j<k} \sigma_i \sigma_j \sigma_k$) is $O(a^3)$.

Let us consider situations in which it is desirable for neurons to act cooperatively, while not always
firing synchronously.

If we choose positive $J$ and (small) negative $\gamma$, which is what we observe at optimality (see
Fig. 4B of the main paper), then neurons are encouraged to be co-active by the positive $J$: they
cooperate. If one considers states with many spikes (large $a$), however, we can see that the effects
of the triplet interaction, which are $O(a^3)$, can exceed the pairwise ones, which are $O(a^2)$. In
other words, $C \sim J\,O(a^2) + \gamma\,O(a^3)$ is a unimodal function with a positive peak, such that for
large $a$, states are strongly suppressed, while for intermediate $a$, they may be facilitated (Fig. 1:
upper (pink) curve). This acts somewhat like negative feedback: for small $a$, the effects of $J$
(larger in magnitude than $\gamma$) dominate, pushing the network towards having co-active cells, while
for large $a$, the effects of $\gamma$ push the network away from having too many co-active cells. Thus,
the network produces cooperative responses but has a diminished probability of having all of the
neurons co-active.

Figure 1: Negative triplet interactions are better at suppressing the all-on state than are
negative pairwise interactions. The plots here show the function $C = J a(a - 1)/2 + \gamma a(a - 1)(a - 2)/6$
for non-negative integer values of $a \in [0, 10]$. For $J = 0.75$ and $\gamma = -0.3$ (upper, pink
curve), the states with many co-active neurons are suppressed by the recurrent interaction. For
$J = -0.75$ and $\gamma = 0.3$ (lower, brown curve), the opposite is true.

Now consider the opposite situation, with negative $J$ and positive $\gamma$. In this case $C$ is unimodal
with a negative peak, and the larger-$a$ states are progressively more facilitated by recurrent
interactions (Fig. 1: lower (brown) curve). This is reminiscent of positive feedback, and leads to heavy
usage of the all-neurons-on state.

Of course, if we allow 4th-order terms in the probability model, then one could have positive $\gamma$,
while still avoiding epileptic levels of synchrony, by having negative 4th-order interactions, for
example.
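The two curves discussed in this section can be tabulated directly from $C = J a(a-1)/2 + \gamma a(a-1)(a-2)/6$. The sketch below uses the parameter values quoted for the two curves ($J = \pm 0.75$, $\gamma = \mp 0.3$) with N = 10; the function and variable names are hypothetical.

```python
def recurrent_contribution(a, J, gamma):
    """C(a) = J * (number of active pairs) + gamma * (number of active triplets)."""
    return J * a * (a - 1) / 2 + gamma * a * (a - 1) * (a - 2) / 6

def contribution_curve(J, gamma, n_neurons=10):
    """C(a) for a = 0 .. n_neurons co-active neurons."""
    return [recurrent_contribution(a, J, gamma) for a in range(n_neurons + 1)]

# Positive J with negative gamma: cooperative, but the all-on state is penalized.
cooperative = contribution_curve(J=0.75, gamma=-0.3)
# Negative J with positive gamma: the sign-flipped curve.
runaway = contribution_curve(J=-0.75, gamma=0.3)

for a, (c_coop, c_run) in enumerate(zip(cooperative, runaway)):
    print(f"a={a:2d}  C(J>0, gamma<0)={c_coop:6.2f}  C(J<0, gamma>0)={c_run:6.2f}")
```

With these values, the first curve peaks at $C = 5.25$ for $a = 6$ and $a = 7$ and falls to $C = -2.25$ when all ten neurons are active, while the second curve is its mirror image, ending at $C = +2.25$ for the all-on state.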